Using allreduce_avg to eliminate scale in auto parallel DP #61622
Conversation
Your PR was submitted successfully. Thank you for your contribution to this open-source project!
LGTM
LGTM
LGTM
LGTM. Each revision requires rerunning CI, so please submit the documentation change as a separate PR.
- If this API is exposed on the official website, the documentation is too brief; please follow the English template.
- Please add the matching Chinese documentation as well.
OK, I will add the documentation in a follow-up PR.
Follow-up documentation PRs:
English: #62480
Chinese: PaddlePaddle/docs#6515
LGTM
PR types
Performance optimization
PR changes
Others
Description
Pcard-76459
This PR replaces the allreduce_sum + scale pattern used for gradient synchronization in data parallelism with the allreduce_avg operator. This improves communication efficiency and keeps all communication operations contiguous on the network, preventing the standalone scale compute op from interfering with the communication fusion optimization in sharding (see the sketch below).
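A minimal sketch of the two patterns, assuming the new op is surfaced in dygraph as paddle.distributed.ReduceOp.AVG (the exact Python exposure is an assumption, not confirmed by this PR):

```python
# Run with: python -m paddle.distributed.launch --gpus 0,1 demo.py
import paddle
import paddle.distributed as dist

dist.init_parallel_env()
world_size = dist.get_world_size()

# Old pattern: allreduce_sum followed by a standalone scale op. The scale
# kernel sits between collectives on the timeline and can break the
# communication fusion applied in sharding.
grad = paddle.ones([4], dtype="float32")
dist.all_reduce(grad, op=dist.ReduceOp.SUM)
grad = grad / world_size  # extra compute op between collectives

# New pattern: one averaging collective. With NCCL >= 2.10 the division is
# performed inside the reduction, so communications stay contiguous.
grad2 = paddle.ones([4], dtype="float32")
dist.all_reduce(grad2, op=dist.ReduceOp.AVG)  # assumes ReduceOp.AVG exists
```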
The feature is enabled through the auto_parallel.strategy.gradient_scale_using_allreduce_avg switch and takes effect only when the NCCL version is >= 2.10 (an enabling sketch follows the list below). Testing the llama1-13B-PP2-SD4-GBS128-ACC32 model on a single A800 machine with 8 GPUs shows a performance improvement of about 2%. This PR also includes the following accompanying work:
- Added the allreduce_avg and reduce_avg communication operators.
- Added the nccl_version API for checking the NCCL version.
- Companion PaddleNLP model switch adaptation PR: PaddlePaddle/PaddleNLP#8021
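A hedged sketch of enabling the switch, guarded by an NCCL version check. The attribute path follows the name given above; the auto.Strategy usage and the assumption that the version API is paddle.version.nccl() returning a dotted string are illustrative, not verified against this PR:

```python
import paddle
from paddle.distributed.fleet import auto

strategy = auto.Strategy()

# Assumed surface of the nccl_version API added by this PR; adjust the
# parsing if your build reports a numeric code instead of e.g. "2.10.3".
major, minor = (int(x) for x in paddle.version.nccl().split(".")[:2])
if (major, minor) >= (2, 10):
    # Replace allreduce_sum + scale with allreduce_avg for DP gradient sync.
    strategy.gradient_scale_using_allreduce_avg = True
```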